Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing
Authors
Abstract
Human perceptions of speaker characteristics, needed to build automatic predictors operating on speech features, have generally been collected through demanding in-lab listening tests under controlled conditions. Concurrently, crowdsourcing has emerged as a valuable approach for running user studies through surveys or quantitative ratings. Micro-task crowdsourcing markets enable the completion of small tasks (commonly lasting minutes or seconds), rewarding users with micro-payments. This paradigm permits effortless collection of user input from a large and diverse pool of participants at low cost. This paper presents different auditory tests for collecting perceptual voice likability ratings, all employing a common set of 30 male and female voices. The tests are based on direct scaling and on paired comparisons, and were conducted both in the laboratory and via crowdsourced micro-tasks. Design considerations are proposed for adapting the laboratory listening tests to a mobile-based crowdsourcing platform so as to obtain trustworthy listener answers. The likability scores obtained by the different test approaches are highly correlated. This outcome motivates the use of crowdsourcing for future listening tests investigating, e.g., speaker characterization, reducing the effort involved in engaging participants and administering the tests on-site.
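The abstract compares two rating paradigms, direct scaling and paired comparison, and reports that the resulting likability scores are highly correlated. Since the two paradigms yield scores on different scales, such agreement is naturally checked at the rank level, e.g. with Spearman's rho. The sketch below illustrates this on invented data (voice names, ratings, and comparison outcomes are hypothetical, not the paper's 30 voices):

```python
from statistics import mean

# Hypothetical data for illustration only (not the paper's actual voices).
# Direct scaling: each listener rates each voice, e.g. on a 1-7 scale.
direct_ratings = {
    "v1": [6, 7, 6], "v2": [3, 4, 3], "v3": [5, 5, 6],
    "v4": [2, 2, 3], "v5": [4, 5, 4],
}

# Paired comparison: (winner, loser) answers to "which voice do you like more?"
pairs = [
    ("v1", "v2"), ("v1", "v4"), ("v3", "v2"), ("v1", "v5"),
    ("v3", "v4"), ("v5", "v4"), ("v3", "v5"), ("v5", "v2"),
    ("v1", "v3"), ("v2", "v4"),
]

def win_rate(voice):
    """Fraction of comparisons involving `voice` that it won."""
    played = [p for p in pairs if voice in p]
    return sum(1 for winner, _ in played if winner == voice) / len(played)

def ranks(values):
    """1-based rank positions, averaging ties."""
    ordered = sorted(values)
    return [ordered.index(v) + ordered.count(v) / 2 + 0.5 for v in values]

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

voices = sorted(direct_ratings)
mos = [mean(direct_ratings[v]) for v in voices]   # direct-scaling scores
rates = [win_rate(v) for v in voices]             # paired-comparison scores
print(spearman(mos, rates))  # -> 1.0 (both paradigms rank the voices identically)
```

In this toy example both paradigms induce the same ordering, so rho is exactly 1; with real listener data the correlation would be high but imperfect, which is the kind of agreement the paper reports.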
Similar resources
Pair-Comparison for Collecting Voice Likability Ratings: Laboratory vs. Crowdsourcing
Crowdsourcing has established itself as a powerful tool, currently adopted in multiple domains as a means of collecting human input for data acquisition and labelling. Experiments conventionally executed in a laboratory setup can now reach a wider audience while keeping its diversity under control. However, the question remains whether the crowdsourcing outcomes are valid and reliable, ...
Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
Crowdsourcing is an emerging collaborative approach applicable, among many other uses, to language and speech processing. Indeed, crowdsourcing has already been applied in the field of speech processing with promising results. However, only a few studies have investigated the use of crowdsourcing in computational paralinguistics. In this contribution, we propose a novel evaluat...
Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings
Speech quality and likability form a multi-faceted phenomenon consisting of a combination of perceptual features that can be neither computed nor weighted automatically with ease. Yet it is often easy to decide which of two voices one likes better, even though it would be hard to describe why, or to name the underlying basic perceptual features. Although likability is inherently subjective and individual...
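A standard way to turn crowdsourced pairwise preferences into a global speaker ranking is a Bradley-Terry model; whether this particular paper uses it is not stated in the snippet, so the following is a generic sketch on hypothetical data (voice names and comparison outcomes are invented), fitted with the classic MM updates:

```python
from collections import defaultdict

def bradley_terry(pairs, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs via
    MM updates: p_i <- wins_i / sum_j n_ij / (p_i + p_j)."""
    items = {v for pair in pairs for v in pair}
    wins = defaultdict(int)
    games = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in pairs:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1
    p = dict.fromkeys(items, 1.0)
    for _ in range(iterations):
        new = {}
        for i in items:
            denom = sum(games[frozenset((i, j))] / (p[i] + p[j])
                        for j in items if frozenset((i, j)) in games)
            new[i] = wins[i] / denom
        total = sum(new.values())
        p = {v: s * len(items) / total for v, s in new.items()}  # normalize
    return p

# Hypothetical pairwise outcomes (winner listed first), illustration only.
pairs = [
    ("v1", "v2"), ("v1", "v4"), ("v3", "v2"), ("v1", "v5"),
    ("v3", "v4"), ("v5", "v4"), ("v3", "v5"), ("v5", "v2"),
    ("v1", "v3"), ("v2", "v4"),
]
strengths = bradley_terry(pairs)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)  # most-liked first: ['v1', 'v3', 'v5', 'v2', 'v4']
```

The pairwise design scales well to crowdsourcing because each micro-task is a single quick A/B decision, yet the fitted strengths recover a full ranking without ever asking listeners for absolute scores.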
Which Synthetic Voice Should I Choose for an Evocative Task?
We explore different evaluation methods for 4 different synthetic voices and 1 human voice. We investigate whether intelligibility, naturalness, or likability of a voice is correlated to the voice’s evocative function potential, a measure of the voice’s ability to evoke an intended reaction from the listener. We also investigate the extent to which naturalness and likability ratings vary depend...
Voice attributes affecting likability perception
Ratings of voices’ likability were collected in two successive studies. A single scale seems to be sufficient for assessing such ratings. Based on limited but controlled data, spectral parameters as well as f0 and articulation rate correlate with the ratings obtained. An automatic classification confirms the relevance of spectral features for the perception of likability. As a simple method of ...